Parts Of Speech Tagging for Indian Languages: A Literature Survey

Author

  • Antony P J
Abstract

Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence. In many natural language processing (NLP) applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation, POS tagging is considered one of the basic necessary tools. Resolving the ambiguities of lexical items is the main challenge in developing an efficient and accurate POS tagger. The literature survey shows that, for Indian languages, POS taggers have been developed only for Hindi, Bengali, Panjabi, and the Dravidian languages. Some generic POS taggers covering Hindi, Bengali, and Telugu have also been developed. The proposed POS taggers are based on different tagsets, developed by different organizations and individuals. This paper surveys the developments in POS taggers and POS tagsets for Indian languages, which are essential computational linguistic tools for many NLP applications.
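As a rough illustration of the task described in the abstract, the sketch below tags each word of a sentence and resolves one ambiguous lexical item ("book" as noun or verb) from its context. It is not taken from the surveyed paper: the English lexicon, the tag names (DET, NOUN, VERB, ADJ, UNK), and the two context rules are hypothetical and do not correspond to any Indian-language tagset or tagger.

```python
# Hypothetical toy lexicon: each word maps to the tags it may take.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "book": ["NOUN", "VERB"],  # ambiguous lexical item
    "flight": ["NOUN"],
    "is": ["VERB"],
    "good": ["ADJ"],
}


def candidate_tags(word):
    """Return every tag the lexicon allows for a word, or UNK if unseen."""
    return LEXICON.get(word.lower(), ["UNK"])


def tag(sentence):
    """Assign exactly one tag to each word, using the previous tag as context."""
    tagged = []
    prev = None
    for word in sentence.split():
        options = candidate_tags(word)
        if len(options) == 1:
            choice = options[0]
        elif prev == "DET" and "NOUN" in options:
            # Illustrative rule: after a determiner, prefer the noun reading.
            choice = "NOUN"
        elif prev is None and "VERB" in options:
            # Illustrative rule: sentence-initially, prefer the verb reading.
            choice = "VERB"
        else:
            choice = options[0]
        tagged.append((word, choice))
        prev = choice
    return tagged


print(tag("Book a flight"))
print(tag("The book is good"))
```

The same word receives different tags in the two sentences, which is the kind of ambiguity a POS tagger has to resolve. Practical taggers replace the hand-written rules with rule sets or statistical models learned from annotated corpora.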


Similar resources

A Survey on Parts of Speech Tagging for Indian Languages

Part-of-speech (POS) tagging is the process of automatically assigning a lexical category to each word according to its context and definition. Each word of a sentence is marked in the corpus as corresponding to a particular part of speech, such as noun, verb, adjective, or adverb. POS tagging serves as a first step in natural language processing applications like information extraction, parsing, and wo...


A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is an essential task for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging for the Persian language. Because of problems in Persian POS t...


Part-of-Speech Tagging System for Indian Social Media Text on Twitter

Automatic part-of-speech (POS henceforth) tagging is a primary necessity for any kind of Natural Language Processing (NLP) application, such as homonym disambiguation, text-to-speech processing, information retrieval, natural language parsing, and information extraction. In this paper, we concentrate on POS tagging systems for Hindi and Bengali tweets. Although automatic POS tagging is a well...


Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey

The problem of tagging in natural language processing is to find a way to tag every word in a text as a particular part of speech, e.g., proper pronoun. POS tagging is a very important preprocessing task for language processing activities. This paper reports on the part-of-speech (POS) taggers proposed for various Indian languages, such as Hindi, Punjabi, Malayalam, Bengali, and Telugu. Various p...


An Approach for Grammatical Constructs of Sanskrit Language using Morpheme and Parts-of-Speech Tagging by Sanskrit Corpus

For many thousands of years, Sanskrit has been the oriental language of India. It is the base for most of the Indian languages. Statistical processing of natural language is based on corpora (singular: corpus). A language corpus is a collection of written and spoken texts, gathered in an organized way in electronic media for the purpose of linguistic research. It pr...



Publication date: 2011